arXiv:1506.05981vl [cs.DS] 19Jun2015 


On Euclid’s Algorithm and Elementary Number Theory 

Roland Backhouse, Joao F. Ferreira 1 

School of Computer Science, University of Nottingham, Nottingham, NG8 IBB, England 


Abstract 

Algorithms can be used to prove and to discover new theorems. This pa¬ 
per shows how algorithmic skills in general, and the notion of invariance in 
particular, can be used to derive many results from Euclid’s algorithm. We 
illustrate how to use the algorithm as a verification interface (i.e., how to 
verify theorems) and as a construction interface (i.e., how to investigate and 
derive new theorems). 

The theorems that we verify are well-known and most of them are in¬ 
cluded in standard number-theory books. The new results concern clistribu- 
tivity properties of the greatest common divisor and a new algorithm for 
efficiently enumerating the positive rationals in two different ways. One way 
is known and is due to Moshe Newman. The second is new and corresponds 
to a deforestation of the Stern-Brocot tree of rationals. We show that both 
enumerations stem from the same simple algorithm. In this way, we con¬ 
struct a Stern-Brocot enumeration algorithm with the same time and space 
complexity as Newman’s algorithm. A short review of the original papers by 
Stern and Brocot is also included. 
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1. Introduction 


An algorithm is a sequence of instructions that can be systematically ex¬ 
ecuted in the solution of a given problem. Algorithms have been studied 
and developed since the beginning of civilisation, but, over the last 50 years, 
the unprecedented scale of programming problems and the consequent de¬ 
mands on precision and concision have made computer scientists hone their 
algorithmic problem-solving skills to a fine degree. 

Even so, and although much of mathematics is algorithmic in nature, the 
skills needed to formulate and solve algorithmic problems do not form an 
integral part of contemporary mathematics education; also, the teaching of 
computer-related topics at pre-university level focuses on enabling students 
to be effective users of information technology, rather than equip them with 
the skills to develop new applications or to solve new problems. 

A blatant example is the conventional treatment of Euclid’s algorithm to 
compute the greatest common divisor (gcd) of two positive natural numbers, 
the oldest nontrivial algorithm that involves iteration and that has not been 
superseded by algebraic methods. (For a modern paraphrase of Euclid’s 
original statement, see jl], pp. 335-336].) Most books on number theory 
include Euclid’s algorithm, but rarely use the algorithm directly to reason 
about properties of numbers. Moreover, the presentation of the algorithm 
in such books has benefited little from the advances that have been made in 
our understanding of the basic principles of algorithm development. In an 
article such as this one, it is of course not the place to rewrite mathematics 
textbooks. Nevertheless, our goal in this paper is to demonstrate how a 
focus on algorithmic method can enrich and re-invigorate the teaching of 
mathematics. We use Euclid’s algorithm to derive both old and well-known, 
and new and previously unknown, properties of the greatest common divisor 
and rational numbers. The leitmotiv is the notion of a loop invariant — how 
it can be used as a verification interface (i.e., how to verify theorems) and as 
a construction interface (i.e., how to investigate and derive new theorems). 

We begin the paper in section [2] with basic properties of the division rela¬ 
tion and the construction of Euclid’s algorithm from its formal specification. 
In contrast to standard presentations of the algorithm, which typically as¬ 
sume the existence of the gcd operator with specific algebraic properties, our 
derivation gives a constructive proof of the existence of an inhmum operator 
in the division ordering of natural numbers. 

The focus of section [3] is the systematic use of invariant properties of 
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Euclid’s algorithm to verify known identities. Section SI on the other hand, 
shows how to use the algorithm to derive new results related with the great¬ 
est common divisor: we calculate sufficient conditions for a natural-valued 
functior@ to distribute over the greatest common divisor, and we derive an ef¬ 
ficient algorithm to enumerate the positive rational numbers in two different 
ways. 

Although the identities in section [3] are well-known, we believe that our 
derivations improve considerably on standard presentations. One example is 
the proof that the greatest common divisor of two numbers is a linear combi¬ 
nation of the numbers; by the simple device of introducing matrix arithmetic 
into Euclid’s algorithm, it suffices to observe that matrix multiplication is 
associative in order to prove the theorem. This exemplifies the gains in our 
problem-solving skills that can be achieved by the right combination of pre¬ 
cision and concision. The introduction of matrix arithmetic at this early 
stage was also what enabled us to derive a previously unknown algorithm to 
enumerate the rationals in so-called Stern-Brocot order (see section [4]), which 
is the primary novel result (as opposed to method) in this paper. 

Included in the appendix is a brief summary of the work of Stern and 
Brocot, the 19th century authors after whom the Stern-Brocot tree is named. 
It is interesting to review their work, particularly that of Brocot, because it is 
clearly motivated by practical, algorithmic problems. The review of Stern’s 
paper is included in order to resolve recent misunderstandings about the 
origin of the Eisenstein-Stern and Stern-Brocot enumerations of the rationals. 

2. Divisibility Theory 

Division is one of the most important concepts in number theory. This 
section begins with a short, basic account of the division relation. We ob¬ 
serve that division is a partial ordering on the natural numbers and pose 
the question whether the inffinum, in the division ordering, of any pair of 
numbers exists. The algorithm we know as Euclid’s gcd algorithm is then 
derived in order to give a positive (constructive) answer to this question. 


2 We call a function natural-valued if its range is the set of natural numbers. 
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2.1. Division Relation 

The division relation, here denoted by an infix “\” symbol, is the relation 
on integers defined to be the converse of the “is-a-multiple-of” relation^: 

m\n = (3k : kETL : n = kxm) ] 

In words, an integer m divides an integer n (or n is divisible by m) if there 
exists some integer k such that n — kxm. In that case, we say that m is a 
divisor of n and that n is a multiple of m. 

The division relation plays a prominent role in number theory. So, we 
start by presenting some of its basic properties and their relation to addition 
and multiplication. First, it is reflexive because multiplication has a unit (i.e., 
m=lxm) and it is transitive, since multiplication is associative. It is also 
(almost) preserved by linear combination because multiplication distributes 
over addition: 

(1) [ k\x A k\y = k\(x + axy) A k\y ] . 

(We leave the reader to verify this law; take care to note the use of the 
distributivity of multiplication over addition in its proof.) Reflexivity and 
transitivity make division a preorder on the integers. It is not anti-symmetric 
but the numbers equivalent under the preordering are given by 

[ m\n A n\m = abs.m = abs.n ] , 


3 The square so-called “everywhere” brackets are used to indicate that a boolean state¬ 
ment is “everywhere” true. That is, the statement has the value true for all instantiations 
of its free variables. Such statements are often called “facts”, or “laws”, or “theorems”. 

When using the everywhere brackets, the domain of the free variables has to be made 
clear. This is particularly important here because sometimes the domain of a variable is 
the integers and sometimes it is the natural numbers. Usually, we rely on a convention for 
naming the variables, but sometimes we preface a law with a reminder of the domain. 

Also, we use a systematic notation for quantified expressions which has the form 
{© bv:range:terrri). There are five components to the notation, which we explain in 
turn. The first component is the quantifier, in this case The second component is the 
dummy bv. The third component is the range of the dummy, a boolean-valued expression 
that determines a set of values of the dummy. The fourth component is the term. The 
final component of the notation is the angle brackets' they serve to delimit the scope of 
the dummy. For more details, see jl, chapter 11] and chapter 8]. 

The symbol = denotes boolean equality. In continued expressions = is read associatively 
and = is read conjunctionally. For example, p = q = r is evaluated associatively —i.e. as 
(p = q)~r or p={q = r ), whichever is most convenient— whereas p = q = r is evaluated 
conjunctionally- i.e. p — q and q = r. 
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where abs is the absolute value function and the infix dot denotes function 
application. Each equivalence class thus consists of a natural number and its 
negation. If the division relation is restricted to natural numbers, division 
becomes anti-symmetric, since abs is the identity function on natural num¬ 
bers. This means that, restricted to the natural numbers, division is a partial 
order with 0 as the greatest element and 1 as the smallest element. 

2.1.1. Infimum in the Division Ordering 

The Erst question that we consider is whether two arbitrary natural num¬ 
bers m and n have an infimum in the division ordering. That is, can we solve 
the following equation^? 

(2) x\\ (Vfc :: k\m A k\n = k\x) . 

The answer is not immediately obvious because the division ordering is par¬ 
tial. (With respect to a total ordering, the infimum of two numbers is their 
minimum; it is thus equal to one of them and can be easily computed by a 
case analysis.) 

If a solution to ([2]) exists, it is unique (because the division relation on 
natural numbers is reflexive and anti-symmetric). When it does have a solu¬ 
tion, we denote it by mVn. That is, provided it can be established that (J2j) 
has a solution, 

(3) [ k\m A k\n = k \ (m\7n) ] 

Because conjunction is idempotent, 

[ k\m A k\m = k\m } 

That is, m solves ([2]) when m and n are equal. Also, because [ k\ 0 ], 

[ k\m A k \0 = k\m ] 

That is, m solves (J2J) when n is 0. So, mVm exists as does mVO, and both 
equal m: 

(4) [ mVm = mV 0 — m] 


4 Unless indicated otherwise, the domain of all variables is IN, the set of natural numbers. 
Note that we include 0 in IN. The notation x :: E means that x is the unknown and the 
other free variables are parameters of the equation E. 
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Other properties that are easy to establish by exploiting the algebraic prop¬ 
erties of conjunction are, first, V is symmetric (because conjunction is sym¬ 
metric) 

(5) [ mVn = nVm ] , 

and, second, V is associative (because conjunction is associative) 

(6) [ (mVn)Vp = mV(nVp) ] 

Note that we choose infix notation for V, since it allows us to write mVnVp 
without having to choose between (mVn)Vp or mV(nVp). 

The final property of V that we deduce from (J3]) is obtained by exploiting 
©, with x and y replaced by m and n, respectively : 

(7) [ (m + axn)Vn = mVn ] 

2.2. Constructing Euclid’s Algorithm 

At this stage in onr analysis, properties (J5]), (J6]) and (JTJ) assume that 
equation ([2]) has a solution in the appropriate cases. For instance, ([5]) means 
that, if ([2D has a solution for certain natural numbers m and n, it also has a 
solution when the values of m and n are interchanged. 

In view of properties ()4]) and ([5]), it remains to show that (J2]) has a 
solution when both m and n are strictly positive and unequal. We do this by 
providing an algorithm that computes the solution. Equation (J2]) does not 
directly suggest any algorithm, but the germ of an algorithm is suggested by 
observing that it is equivalent to 

(8) x, yv. x = y A (Vfc:: k\m A k\n = k\x A k\y) 

This new shape strongly suggests an algorithm that, initially, establishes the 
truth of 

(Vfc:: k\m A k\n = k\x A k\y ) 

—which is trivially achieved by the assignment x,y := m,n — and then, 
reduces x and y in such a way that the property is kept invariant whilst 
making progress to a state satisfying x = y. When such a state is reached, 
we have found a solution to the equation (jgj) , and the value of x (or y since 
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they are equal) is a solution of Oh Thus, the structure of the algorithm we 
are trying to develop is as follow^: 

{ 0 < m A 0 <7i } 
x , y := m , n ; 

{ Invariant: (V/c:: k\m A k\n = k\x A k\y) } 
do x^y x ,y := A , B 

od 

{ x = y A (Vfc:: k\m A k\n = k\x A k\y) } 

Now we only have to define A and B in such a way that the assignment in 
the loop body leads to a state where x = y is satisfied while maintaining the 
invariant. Exploiting the transitivity of equality, the invariant is maintained 
by choosing A and B so that 

(9) (\/k:: k\x A k\y = k\A A k\B) . 

To ensure that we are making progress towards the termination condition, 
we have to define a bound function , which is a natural-valued function of 
the variables x and y that measures the size of the problem to be solved. A 
guarantee that the value of such a bound function is always decreased at each 
iteration is a guarantee that the number of times the loop body is executed is 
at most the initial value of the bound function. The definition of the bound 
function depends on the assignments we choose for A and B. 

At this point, we need to exploit properties specific to division. (Refer 
back to section 12.11 for a discussion of some of the properties.) Inspecting 
the shape of (JHD, we see that it is similar to the shape of property (P). This 
suggests that we can use (jT|) , and in fact, considering this property, we have 
the corollary: 

(10) [ k\x A k\y = k\{x—y ) A k\y ] . 


5 We use the Guarded Command Language (GCL), a very simple programming language 
with just four programming constructs—assignment, sequential composition, conditionals, 
and loops. The GCL was introduced by Dijkstra [4|. The statement do S' od is a loop 
that executes S repeatedly while at least one of S’s guards is true. Expressions in curly 
brackets are assertions. 
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The relevance of this corollary is that our invariant is preserved by the as¬ 
signment x := x—y (leaving the value of y unchanged). (Compare (1IU1) with 
(J9]).) Note that this also reduces the value of x when y is positive. This 
suggests that we strengthen the invariant by requiring that x and y remain 
positive; the assignment x := x—y is executed when x is greater than y and, 
symmetrically, the assignment y := y—x is executed when y is greater than 
x. As bound function we can take x+y. The algorithm becomes 

{ 0 < m A 0 < n } 
x , y := rn , n ; 

{ Invariant: 0<x A 0 <y A (Vfc:: k\m A k\n = k\x A k\y) 

Bound function: x+y } 

do x 7 -y —y 

if y < x —* x \= x—y 
□ x < y —y y : = y—x 

fi 

od 

{ 0<x A 0 <y A x — y A (Vfc:: k\m A k\n = k\x A k\y) } 

(We leave the reader to perform the standard steps used to verify the cor¬ 
rectness of the algorithm.) Finally, since 

(x<y V y<x) = x^y , 

we can safely remove the outer guard and simplify the algorithm, as shown 
below. 

{ 0<m A 0 < n } 
x , y := m , n ; 

{ Invariant: 0<x A 0 <y A (Vfc:: k\m A k\n = k\x A k\y) 

Bound function: x+y } 

do y < x —» x := x—y 
□ x <y —y y := y—x 
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od 

{ 0<x A 0 <y A x — y A (Vfc:: k\m A k\n = k\x A k\y) } 

The algorithm that we have constructed is Euclid’s algorithm for computing 
the greatest common divisor of two positive natural numbers, the oldest non¬ 
trivial algorithm that has survived to the present day! (Please note that our 
formulation of the algorithm differs from Euclid’s original version and from 
most versions found in number-theory books. While they use the property 
[ mVn = nV(mmodn) ], we use urn i.e., [ mVn = ( m—nY\Zn ]. For 
an encyclopedic account of Euclid’s algorithm, we recommend [lj, p. 334].) 

2.3. Greatest Common Divisor 

In section l2.1.1l we described the problem we were tackling as establishing 
that the inhmum of two natural numbers under the division ordering always 
exists; it was only at the end of the section that we announced that the algo¬ 
rithm we had derived is an algorithm for determining the greatest common 
divisor. This was done deliberately in order to avoid the confusion that can 
—and does— occur when using the words “greatest common divisor”. In 
this section, we clarify the issue in some detail. 

Confusion and ambiguity occur when a set can be ordered in two differ¬ 
ent ways. The natural numbers can be ordered by the usual size ordering 
(denoted by the symbol <), but they can also be ordered by the division rela¬ 
tion. When the ordering is not made explicit (for instance, when referring to 
the “least” or “greatest” of a set of numbers), we might normally understand 
the size ordering, but the division ordering might be meant, depending on 
the context. 

In words, the infim n m of two values in a partial ordering —if it exists— 
is the largest value (with respect to the ordering) that is at most both values 
(with respect to the ordering). The terminology “greatest lower bound” is 
often used instead of “infimum”. Of course, “greatest” here is with respect 
to the partial ordering in question. Thus, the inhmum (or greatest lower 
bound) of two numbers with respect to the division ordering —if it exists— 
is the largest number with respect to the division ordering that divides both 
of the numbers. Since, for strictly positive numbers, “largest with respect 
to the division ordering” implies “largest with respect to the size ordering” 
(equally, the division relation, restricted to strictly positive numbers, is a 
subset of the < relation), the “largest number with respect to the division 
ordering that divides both of the numbers” is the same, for strictly positive 
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numbers , as the “largest number with respect to the size ordering that divides 
both of the numbers”. Both these expressions may thus be abbreviated to 
the “greatest common divisor” of the numbers, with no problems caused by 
the ambiguity in the meaning of “greatest” — when the numbers are strictly 
positive. Ambiguity does occur, however, when the number 0 is included, 
because 0 is the largest number with respect to the division ordering, but the 
smallest number with respect to the size ordering. If “greatest” is taken to 
mean with respect to the division ordering on numbers, the greatest common 
divisor of 0 and 0 is simply 0. If, however, “greatest” is taken to mean with 
respect to the size ordering, there is no greatest common divisor of 0 and 0. 
This would mean that the gcd operator is no longer idempotent, since 0V0 
is undefined, and it is no longer associative, since, for positive m, (mVO)VO 
is well-defined whilst mV(OVO) is not. 

Concrete evidence of the confusion in the standard mathematics literature 
is easy to find. We looked up the definition of greatest common divisor in 
three commonly used undergraduate mathematics texts, and found three 
non-equivalent definitions. The first [^, p. 30] defines “greatest” to mean 
with respect to the divides relation (as, in our view, it should be defined); 
the second (Q, p. 21, def. 2.2] defines “greatest” to mean with respect to 
the < relation (and requires that at least one of the numbers be non-zero). 
The third text [?;, p. 78] excludes zero altogether, defining the greatest 
common divisor of strictly positive numbers as the generator of all linear 
combinations of the given numbers; the accompanying explanation (in words) 
of the terminology replaces “greatest” by “largest” but does not clarify with 
respect to which ordering the “largest” is to be determined. 

Now that we know that V is the greatest common divisor, we could 
change the operator to gcd, i.e., replace mVn by m gcd n. However, we stick 
to the “V” notation because it makes the formulae shorter, and, so, easier 
to read. We also use “A” to denote the least common multiple operator. 
To remember which is which, just remember that inkma ( lower bounds) 
are indicated by downward- pointing symbols (eg. j. for minimum, and V for 
disjunction) and suprema ( upper bounds) by upward-pointing symbols. 

3. Euclid’s Algorithm as a Verification Interface 

In this section we show how algorithms and the notion of invariance can 
be used to prove theorems. In particular, we show that the exploitation of 
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Euclid’s algorithm makes proofs related with the greatest common divisor 
simple and more systematic than the traditional ones. 

There is a clear pattern in all our calculations: everytime we need to prove 
a new theorem involving V, we construct an invariant that is valid initially 
(with x , y := m ,n) and that corresponds to the theorem to be proved 
upon termination (with x = y = mX7n). Alternatively, we can construct an 
invariant that is valid on termination (with x = y = m\7n) and whose initial 
value corresponds to the theorem to be proved. The invariant in section 13.31 
is such an example. Then, it remains to prove that the chosen invariant is 
valid after each iteration of the repeatable statement. 

We start with a minor change in the invariant that allows us to prove some 
well-known properties. Then, we explore how the shape of the theorems to 
be proved determine the shape of the invariant. We also show how to prove 
a geometrical property of V. 

3.1. Exploring the invariant 

The invariant that we use in sect ion rests on the validity of the theorem 

[ k\jn A k\n = k\(m—n ) A k\n } 

But, as Van Gasteren observed in (§, Chapter 11], we can use the more 
general and equally valid theorem 

[ k \ ( exm ) A k \ {cxn) = k\(cx (m—n)) A k \ (cxn) ] 

to conclude that the following property is an invariant of Euclid’s algorithm: 

(' Vk,c:: k\(cxm) A k\(cxn ) = k\(cxx ) A k\(cxy)) . 

In particular, the property is true on termination of the algorithm, at which 
point x and y both equal mVn. That is, for all m and n, such that 0 < m 
and 0 < 7i, 

(11) [ k\{cxm) A k\{cxn) = k\{cx(mS7n )) ] . 

In addition, theorem (fill) holds when m < 0, since 

[ (-m)Vn = mVn ] A [ k\(cx(—m)) = k\(cX7n ) ] , 

and it holds when m equals 0, since [ k\ 0 ]. Hence, using the symmetry 
between m and n we conclude that (HID is indeed valid for all integers m 
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and n. (In Van Gasteren’s presentation, this theorem only holds for all 

(m,n) ± ( 0 , 0 ).) 

Theorem (ITT]) can be used to prove a number of properties of the greatest 
common divisor. If, for instance, we replace k by m, we have 

[ m\(cxn) = m\(cx(mVn)) ] , 

and, as a consequence, we also have 

(12) [ (m\(cxn) = m\c ) -<= mVn = l ] 

More commonly, (fT 2 j) is formulated as the weaker 

[ m\c -4= mVn = l A m\(cxn ) ] , 

and is known as Euclid’s Lemma. Another significant property is 

(13) [ fc\(cx(mVfi)) = k \ ((cxm)V(cxn)) ] , 

which can be proved as: 

k\(cx ( m\7n )) 

= { (ED } 

k\(cxm ) A k\(cxn) 

= { 0 } 

k\ ((cxm)V(cxn)) . 

From (USD we conclude 

(14) [ (cxm)V(cxn) — c x (mVn) ] 

Property m states that multiplication by a natural number distributes 
over V. It is an important property that can be used to simplify arguments 
where both multiplication and the greatest common divisor are involved. An 
example is Van Gasteren’s proof of the theorem 

(15) [ (mxp) Vn = mVn •<= p\7n = 1 ] , 

which is as follows: 
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mVn 


= { p\7n = 1 and 1 is the unit of multiplication } 

(mx(j)Vn))Vri 

= { o } 

(■ mxp ) V ( mxn ) V n 
= { (mxn)Vn = n } 

(mxp)Vn . 

V on h-'ft side 

In the previous sections, we have derived a number of properties of the 
V operator. However, where the divides relation is involved, the operator 
always occurs on the right side of the relation. (For examples, see (J3]) and 
(TOO Now we consider properties where the operator is on the left side of a 
divides relation. Our goal is to show that 

(16) [ (mVn) \ k = (3a, b:: k = mxa + nxb ) ] , 

where the range of a and b is the integers. 

Of course, if (1T61) is indeed true, then it is also true when k equals mVn. 
That is, a consequence of (ITBj) is 

(17) [ (3a, b:: mX7n = mxa + nxb) ] 

In words, mVn is a linear combination of m and n. For example, 

3V5 = 1 = 3x2 — 5x1 = 5x2 — 3x3 . 

Vice-versa, if (TTT1) is indeed true then (HBD is a consequence. (The crucial 
fact is that multiplication distributes through addition.) It thus suffices to 
prove (1171) . 

We can establish (1X7)) by constructing such a linear combination for given 
values of m and n. 

When n is 0, we have 

mVO — m — mxl + Oxl . 

(The multiple of 0 is arbitrarily chosen to be 1.) 
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When both m and n are non-zero, we need to augment Euclid’s algorithm 
with a computation of the coefficients. The most effective way to establish 
the property is to establish that x and y are linear combinations of m and n is 
an invariant of the algorithm; this is best expressed using matrix arithmetic. 

In the algorithm below, the assignments to x and y have been replaced 
by equivalent assignments to the vector (x y). Also, an additional variable 
C, whose value is a 2x2 matrix of integers has been introduced into the 
program. Specifically, I, A and B are 2x2 matrices; I is the identity matrix 
(p , A is the matrix ( ^ and B is the matrix ~ 1 ). (The assignment 
(x y) := (x y)x A is equivalent to x , y := x—y , y, as can be easily checked.) 

{ 0 < m A 0 < n } 

(x y ), C := (m n ), I ; 

{ Invariant: (x y) = (m n) x C } 
do |/<x A (x i/),C := (x y) x A , CxA 
□ x < y —)■ (x y ), C := (x y) x B , CxB 
od 

{ (x y) = (mVn mVn) = (m n) x C } 

The invariant shows only the relation between the vectors (x y) and (m n); 
in words, (x y) is a multiple of (m n). 

It is straightforward to verify that the invariant is established by the 
initialising assignment, and maintained by the loop body. Crucial to the 
proof that it is maintained by the loop body is that multiplication (here of 
matrices) is associative. Had we expressed the assignments to C in terms of 
its four elements, verifying that the invariant is maintained by the loop body 
would have amounted to giving in detail the proof that matrix multiplication 
is associative. This is a pointless duplication of effort, avoiding which fully 
justifies the excursion into matrix arithmetic. 

(An exercise for the reader is to express the property that m and n are 
linear combinations of x and y. The solution involves observing that A and 
B are invertible. This will be exploited in section l4~2l f 

3.3. A geometrical property 

In this section, we prove that in a Cartesian coordinate system, mVn 
can be interpreted as the number of points with integral coordinates on the 
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straight line joining the points (0,0) and (■ m,n ), excluding (0,0). Formally, 
with dummies s and t ranging over integers, we prove for all m and n: 

(£s,f : mxt— nxs A s< m A t < n A (0 < s V 0 < t) : 1) 

( 18 ) = 

mVn . 

We begin by observing that (fT 8 ]l holds when m = 0 or when n = 0 (we leave 
the proof to the reader). When 0 < m and 0 < n, we can simplify the range 
of (fTHp . First, we observe that 

(0 < s < m = 0 < f < n) <(= mxt = nxs , 


since 


0 < t < n 

= { 0 < m } 

0 < mxt < mxn 
= { mxt = nxs } 

0 <nxs < mxn 

= { 0 < n, cancellation } 

0 < s < m . 

As a result, (USD can be written as 

(19) [ (Es,t : mxt = nxs A 0 < t < n : l) = mVn ] 

In order to use Euclid’s algorithm, we need to find an invariant that allows 
us to conclude (fl9|) . If we use as invariant 

(20) (£s,f : xxt = yxs A 0 < t < y : 1 )=xVy , 
its initial value is the property that we want to prove: 

(Es,t : mxt — nxs A 0 <t <n : l)=mVn . 

Its value upon termination is 

(Es,t : (mVn) xf = (mVn)xs A 0 < K mVn : 1) = (mVn)V(mVn) , 
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which is equivalent (by cancellation of multiplication and idempotence of V) 
to 


(£s, t : i = s A 0<t<mVn : 1 )=mVn . 

It is easy to see that the invariant reduces to true on termination (because 
the sum on the left equals mVn), making its initial value also true. 

It is also easy to see that the righthand side of the invariant is unnecessary 
as it is the same initially and on termination. This motivates the generali¬ 
sation of the concept “invariant”. “Invariants” in the literature are always 
boolean-valued functions of the program variables, but we see no reason why 
“invariants” shouldn’t be of any type: for us, an invariant of a loop is simply 
a function of the program variables whose value is unchanged by execution 
of the loop bodj|§. In this case, the value is a natural number. Therefore, we 
can simplify (120|) and use as invariant 

(21) (£s, t : xxt = yxs A 0 < t < y : 1) 

Its value on termination is 

(Es,t : (mVn)xl = (mVn)xs A 0 <t <mVn : 1) , 

which is equivalent to 

(Es,t : t — s A 0 < t < mX7n : 1) 

As said above, this sum equals mVn. 

Now, since the invariant (l2Tj) equals the lcfthand side of (TT9|) for the initial 
values of x and y, we only have to check if it remains constant after each 
iteration. This means that we have to prove (for y < x A 0 < y): 

(Es,t : xxt — yxs A 0 < t < y : 1) 

= (Es,t : ( x—y)xt = yxs A 0 <t<y : 1) , 


6 Some caution is needed here because our more general use of the word “invariant” 
does not completely coincide with its standard usage for boolean-valued functions. The 
standard meaning of an invariant of a statement S' is a boolean-valued function of the 
program variables which, in the case that the function evaluates to true, remains true after 
execution of S. Our usage requires that, if the function evaluates to false before execution 
of S, it continues to evaluate to false after executing S. 


16 



which can be rewritten, for positive x and y, as: 

(Es, £ : (x+y) xt = yxs A 0 < £ < y : 1) 

= (Es, £ : xxt — yxs A 0 < £ < y : 1) . 

The proof is as follows: 

(Es, £ : (x+y) xt = yxs A 0 < £ < y : 1) 

= { distributivity and cancellation } 

(Es, £ : xxt — yx(s-t) A 0 <t<y : 1) 

= { range translation: s := s+t } 

(Es, £ : xxt — yxs A 0 < £ < y : 1) . 

Note that the simplification done in (fTUjl allows us to apply the range trans¬ 
lation rule in the last step without having to relate the range of variable s 
with the possible values for variable £. 

4. Euclid’s Algorithm as a Construction Interface 

In this section we show how to use Euclid’s algorithm to derive new 
theorems related with the greatest common divisor. We start by calculating 
reasonable sufficient conditions for a natural-valued function to distribute 
over the greatest common divisor. We also derive an efficient algorithm for 
enumerating the positive rational numbers in two different ways. 

4-1. Distributivity properties 

In addition to multiplication by a natural number, there are other func¬ 
tions that distribute over V. The goal of this subsection is to determine 
sufficient conditions for a natural-valued function / to distribute over V, 
i.e., for the following property to hold: 

(22) [ /.(mVn) = /.mV f.n ) 

For simplicity’s sake, we restrict all variables to natural numbers. This im¬ 
plies that the domain of / is also restricted to the natural numbers. 

We explore (|22|) by identifying invariants of Euclid’s algorithm involving 
the function /. To determine an appropriate loop invariant, we take the 
right-hand side of (122|) and calculate: 
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f.m V f.n 

= { the initial values of x and y are m and n, respectively } 

f-x V f.y 

= { suppose that f.x V f.y is invariant; 

on termination: x — mS/n A y — mS/n } 

/.(mVn) V /.(mVn) 

= { V is idempotent } 

/.(mVn) . 

Property fl22l) is thus established under the assumption that f.xVf.y is an 
invariant of the loop body. (Please note that this invariant is of the more 
general form introduced in section [3T3l ") 

The next step is to determine what condition on / guarantees that /.iV f.y 
is indeed invariant. Noting the symmetry in the loop body between x and y, 
the condition is easily calculated to be 

[ f-( x ~y) V f-y — f-x V f.y 4= 0 <y<x } . 

Equivalently, by the rule of range translation (x := x+y), the condition can 
be written as 

(23) [ f.x V f.y = f.(x+y) V f.y ^ 0 < x A 0 < y ] . 

Formally, this means that 

“ / distributes over V ” 4= (J23J) . 

Incidentally, the converse of this property is also valid: 

dUD 4= “ / distributes over V ” . 

The simple calculation proceeds as follows: 
f.(x+y) V f.y 

= { f distributes over V } 

f.((x+y)Vy) 
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= { (ED } 

f.(xVy) 

= { / distributes over V } 

f.x V f.y . 

By mutual implication we conclude that 
“ / distributes over V ” = (|23D . 

We have now reached a point where we can determine if a function distributes 
over V. However, since (I23|) still has two occurrences of V, we want to refine 
it into simpler properties. Towards that end we turn our attention to the 
condition 

f.x V f.y = f.{x+y) V f.y , 

and we explore simple ways of guaranteeing that it is everywhere true. For 
instance, it is immediately obvious that any function that distributes over 
addition distributes over V. (Note that multiplication by a natural number 
is such a function.) The proof is very simple: 

f.(x+y)Vf.y 

= { f distributes over addition } 

(f-x+f.y) V f.y 

= { m } 

f.x V f.y . 

In view of properties (ED and (fT5lh we formulate the following lemma, which 
is a more general requirement: 

Lemma 24. All functions / that satisfy 

(Vx,r/:: (3a, b : aVf.y = 1 : ffx+y) = a x f.x + b x f.y)) 

distribute over V. 

Proof 
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f.(x+y)V f.y 

= { f-(x+y) = a x f.x + b x f.y } 

(a x f.x + b x f.y ) V f.y 

= { m } 

(a x /.x) V f.y 

= { a V f.y = 1 and (TT5|) } 

f.x V f.y . 

□ 

Note that since the discussion above is based on Euclid’s algorithm, lemma 
[2H only applies to positive arguments. We now investigate the case where m 
or n is 0. We have, for rn — 0 : 

/•(OVn) = /.OV/.n 
= { [ 0 Vm = m ] } 

f.n = /.OV/.n 

= { [ a\b = a = bVa } } 

f.n \ f.0 

{ obvious possibilities that make the expression valid 
are /.0 = 0 , f.n = 1 , or f.n = f. 0 ; the first is the 
interesting case } 

/• 0 = 0 . 

Hence, using the symmetry between m and n we have, for m = 0 or n = 0: 
(25) /.(mVn) = f.mV f.n <= f. 0 = 0 . 

The conclusion is that we can use (1251) and le mm a 1221 to prove that a natural¬ 
valued function with domain IN distributes over V. We were unable to prove 
that the condition in lemma [24] is necessary for a function to distribute over 
V, but we do not know any function distributing over V that does not satisfy 
the condition. 
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Example 0: the Fibonacci function 

In [9j, Edsger Dijkstra proves that the Fibonacci function distributes over 
V . He does not use lemma 1241 explicitly, but he constructs the property 

(26) fib .(x+y) = fib.( 2 /—1) x fib.a; + fib.(x+l) x fib .y , 

and then, using the lemma 

fib. y V fib.( 2 /—1) = 1 , 

he concludes the proof. His calculation is the same as that in the proof of 
lemma [24] but for particular values of a and b and with / replaced by fib. 
Incidentally, if we don’t want to construct prop erty (I26|) we can easily verify 
it using induction — more details are given in [lOj. 

An interesting application of this distributivity property is to prove that 
for any positive k, every A;th number in the Fibonacci sequence is a multiple 
of the A;th number in the Fibonacci sequence. More formally, the goal is to 
prove 


fib.(nxfc) is a multiple of fib.A; , 

for positive k and natural n. A concise proof is: 

fib.(nxfc) is a multiple of fib.A; 

= { definition } 

fib.A; \ fib.(nx A:) 

= { [ a\b = a\7b = a ] , 

with a fib.A; and b i\b.(nxk) } 

fib.A; V fib.(nxA;) = fib.A; 

= { fib distributes over V } 

fib.(A:V(nxA:)) = fib. A: 

= { k\/(nxk) — k and reflexivity } 

true . 
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Example 1: the Mersenne function 

We now prove that, for all integers k and m such that 0 < k m , the function 
/ defined as 

f.m = k m -1 

distributes over V. 

First, we observe that f.O = 0. (Recall the discussion of (l25]h ) Next, we 
use lemma [24j This means that we need to find integers a and b , such that 

k m +n-i = ax(fc m —1) + bx(k n —l) A aV(k n - 1) = 1 . 

The most obvious instantiations for a are 1, k n and k n — 2. (That two consec¬ 
utive numbers are coprime follows from ([7]).) Choosing a — 1, we calculate 
b: 

k m+n _ 1 = (k m -l) +bx(k n -l) 

= { arithmetic } 

k m+n_ k m = 

= { multiplication distributes over addition } 

k m x(k n -1 ) = bx(k n -1 ) 

{ Leibniz } 
k m = b . 

We thus have 

k m+n -l = lx(k m -l) + k m x(k n -l) A lV(A: n -l) = l , 

and we use lemma [24] to conclude that / distributes over V: 

[ (fc m -l) V (k n -l) = k (mVn) -1 ] . 

In particular, the Mersenne function, which maps m to 2 m —1, distributes 
over V: 

(27) [ (2 m —1) V (2 n —1) = 2 (mVn) -l ] . 

A corollary of d 23) is the property 

[ (2 m -l)V(2 n -l) = 1 = mVn = 1] . 

In words, two numbers 2 m —1 and 2 n —l are coprime is the same as exponents 
m and n are coprime. 
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4-2. Enumerating the Rationals 

A standard theorem of mathematics is that the rationals are “denumer¬ 
able”, i.e. they can be put in one-to-one correspondence with the natural 
numbers. Another way of saying this is that it is possible to enumerate the 
rationals so that each appears exactly once. 



“not at all obvious” how to “deforest” the Stern-Brocot tree of rationals. 

In this section, we derive an efficient algorithm for enumerating the ratio¬ 
nals according to both orderings. The algorithm is based on a bijection be¬ 
tween the rationals and invertible 2x2 matrices. The key to the algorithm’s 
derivation is the reformulation of Euclid’s algorithm in terms of matrices 
(see section I3.2j) . The enumeration is efficient in the sense that it has the 
same time and space complexity as the algorithm credited to Moshe Newman 
in jl2|, albeit with a constant-fold increase in the number of variables and 
number of arithmetic operations needed at each iteration. 

Note that, in our view, it is misleading to use the name “Calkin-Wilf 
tree of rationals” because Stern [l5j] had already documented essentially the 
same structural characterisation of the rationals almost 150 years earlier 
than Calkin and Wilf. For more explanation, see the appendix in which we 
review in some detail the relevant sections of Stern’s paper. Stern attributes 
the structure to Eisenstein, so henceforth we refer to the “Eisenstein-Stern” 
tree of rationals where recent publications (including our own 0 ) would 
refer to the “Calkin-Wilf tree of rationals”. Section [7] includes background 
information. For a comprehensive account of properties of the Stern-Brocot 
tree, including further relationships with Euclid’s algorithm, see fiol . pp. 
116-118], 

4-2.1. Euclid’s Algorithm 

A positive rational in so-called “lowest form” is an ordered pair of positive, 
coprime integers. Every rational — has unique lowest-form representation 


' By an efficient enumeration we mean a method of generating each rational without 
duplication with constant cost per rational in terms of arbitrary-precision simple arithmetic 
operations. 
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• For example, | is a rational in lowest form, whereas | is the same 
rational, but not in lowest form. 

Because computing the lowest-form representation involves computing 
greatest common divisors, it seems sensible to investigate Euclid’s algorithm 
to see whether it gives insight into how to enumerate the rationals. Indeed 
it does. 

Beginning with an arbitrary pair of positive integers m and n, the algo¬ 
rithm presented in section 13.21 calculates an invertible matrix C such that 

(mVn mVn) = (m n) x C . 

It follows that 

(28) (i i) X cr‘ = (*/(„„, 7 (mv „)) . 

Because the algorithm is deterministic, positive integers m and n uniquely 
define the matrix C. That is, there is a function from pairs of positive 
integers to finite products of the matrices A and B. Recall that A is the 
matrix and B is the matrix ~ ). 

Also, because the matrices A and B are constant and invertible, C -1 is 
a finite product of the matrices A -1 and B 1 and (J28j) uniquely defines a 
rational —. We may therefore conclude that there is a bijection between the 
rationals and the finite products of the matrices A -1 and B 1 provided that 
we can show that all such products are different. 

The finite products of matrices A -1 and B 1 form a binary tree with 
root the identity matrix (the empty product). Renaming A -1 as L and B 1 
as R, the tree can be displayed with “L” indicating a left branch and “R” 
indicating a right branch. Fig. Q] displays the Erst few levels of the tree. 

That all matrices in the tree are different is proved by showing that the 
tree is a binary search tree (as formalised shortly). The key element of the 
proo{§ is that the determinants of A and B are both equal to 1 and, hence, 
the determinant of any finite product of Ls and Rs is also 1. 

Formally, we define the relation -< on matrices that are finite products of 


®The proof is an adaptation of the proof in 0 p. 117] that the rationals in the Stern- 
Brocot tree are all different. Our use of determinants corresponds to their use of “the 
fundamental fact” (4.31). Note that the definitions of L and R are swapped around in 
0 .) 
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Figure 1: Tree of Products of L and R 


Ls and Rs by 

(a c\ j (a' d\ _ a+c ^ a'+c' 

\b d) ^ \b' d'J b+d < b'+d' ‘ 

(Note that the denominator in these fractions is strictly positive; this fact is 
easily proved by induction.) We prove that, for all such matrices X, Y and 

Z, 

(29) XxLxY -< X -< XxRxZ . 

It immediately follows that there are no duplicates in the tree of matrices 
because the relation -< is clearly transitive and a subset of the inequality 
relation. (Property ([29]) formalises precisely what we mean by the tree of 
matrices forming a binary search tree: the entries are properly ordered by 
the relation -<, with matrices in the left branch being “less than” the root 
matrix which is “less than” matrices in the right branch.) 

In order to show that 

(30) XxLxY -< X 

suppose X= 0) and Y = c d ^j . Then, since L= (J ( | ) ), (j5U|) is easily 
calculated to be 

(a+c) x cl T (cxb ) T (oTc)xc / T (cx d f j oTc 

(b~\~d^ xa r T (dx6^) T (6-)-d) xp T (dxc/ / ) b-\~d 

That this is true is also a simple, albeit longer, calculation (which exploits the 
monotonicity properties of multiplication and addition); as observed earlier, 
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the key property is that the determinant of X is 1, i.e. axd — bxc = 1. The 
proof that X-<Xx R xZ is similar. 

Of course, we can also express Euclid’s algorithm in terms of transpose 
matrices. Instead of writing assignments to the vector (x y), we can write 

assignments to its transpose Noting that A and B are each other’s 

transposition, the assignment 

(x y) , C := (x y) x A , Cx A 
in the body of Euclid’s algorithm becomes 

G) - c := bx G) - bxc ■ 

Similarly, the assignment 

(x y) , C := (x y) x B , CxB 


becomes 

(;),C . A„Q ,A„C. 

On termination, the matrix C computed by the revised algorithm will of 
course be different; the pair (" r w (mvn)N ) is recovered from it by the identity 


C W “ (”/WvJ ' 

In this way, we get a second bijection between the rationals and the finite 
products of the matrices A -1 and B _1 . This is the basis for our second 
method of enumerating the rationals. 

In summary, we have: 


Theorem 31. Define the matrices L and R by 


L 



and R 



Then the following algorithm computes a bijection between the (positive) 
rationals and the finite products of L and R. Specifically, the bijection is 
given by the function that maps the rational — to the matrix D constructed 
by the algorithm together with the function from a finite product, D, of Ls 
and Rs to (1 1) x D. (The comments added to the algorithm supply the 
information needed to verify this assertion.) 
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{ 0 <m A 0 < n } 

(x y) , D := (m n) , I ; 
{ Invariant: (m rt) 
do y < x —* (& ?/) , D 
□ !<!/ A (it/),D 
od 


N)xD } 

(x y ) x L _1 , LxD 
(x ?/) x R -1 , RxD 


{ (x y) = (mVn mVn) A ( m /(m V n) 7 m ) 


(H)xD } 


Similarly, by applying the rules of matrix transposition to all expressions 
in the above, Euclid’s algorithm constructs a second bijection between the 
rationals and finite products of the matrices L and R. Specifically, the 
bijection is given by the function that maps the rational — to the matrix D 
constructed by the revised algorithm together with the function from finite 
products, D, of Ls and Rs to D x (*). 

□ 


4-2.2. Enumerating Products ofh and R 

The problem of enumerating the rationals has been transformed to the 
problem of enumerating all finite products of the matrices L and R. As 
observed earlier, the matrices are naturally visualised as a tree —recall fig. 
CD — with left branching corresponding to multiplying (on the right) by L and 
right branching to multiplying (on the right) by R. 

By premultiplying each matrix in the tree by (1 1), we get a tree of 
rationals. (Premultiplying by (1 1) is accomplished by adding the elements in 
each column.) This tree is sometimes called the Calkin-Wilf tree BQQ; 
we call it the Eisenstein-Stern tree of rationals. (See the appendix for an 
explanation.) The first four levels of the tree are shown in fig. [21 In this 
figure, the vector (x y) has been displayed as -. (Note the order of x and y. 
This is to aid comparison with existing literature.) 

By postmultiplying each matrix in the tree by (*), we also get a tree of 
rationals. (Postmultiplying by (*) is accomplished by adding the elements 
in each row.) This tree is called the Stern-Brocot tree [lo[ pp. 116-118]. See 


fig. [3j In this figure, the vector (^J has been displayed as |. 

Of course, if we can End an efficient way of enumerating the matrices in 
fig- CD we immediately get an enumeration of the rationals as displayed in the 
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Figure 3: Stern-Brocot Tree of Rationals 


Eisenstein-Stern tree and as displayed in the Stern-Brocot tree — as each 
matrix is enumerated, simply premultiply by (1 1) or postmultiply by (*). 
Formally, the matrices are enumerated by enumerating all strings of Ls and 
Rs in lexicographic order, beginning with the empty string; each string is 
mapped to a matrix by the homomorphism that maps “L” to L, “R” to R, 
and string concatenation to matrix product. It is easy to enumerate all such 
strings; as we see shortly, converting strings to matrices is also not difficult, 
for the simple reason that L and R are invertible. 

The enumeration proceeds level-by-levcl. Beginning with the unit matrix 
(level 0), the matrices on each level are enumerated from left to right. There 
are 2 k matrices on level k, the first of which is iA The problem is to deter¬ 
mine for a given matrix, which is the matrix “adjacent” to it. That is, given 
a matrix D, which is a Enite product of L and R, and is different from R A 
for all k, what is the matrix that is to the immediate right of D in fig. [T|? 

Consider the lexicographic ordering on strings of Ls and Rs of the same 
length. The string immediately following a string s (that is not the last) is 
found by identifying the rightmost L in s. Supposing s is the string tLR J , 
where R J is a string of j Rs, its successor is fRLL 

It’s now easy to see how to transform the matrix identihed by s to its suc- 
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cessor matrix. Simply postmultiply by R ~ j x L' 1 x R x LA This is because, 
for all T and j, 

(TxLx R j ) x (R- j x L _1 x R x L j ) = T x R x V . 

Also, it is easy to calculate R ~ j x L _1 x R x LA Specifically, 

R j x L' 1 x R x V = ( 2j + 1 1 

v -1 0 

(We omit the details. Briefly, by induction, equals (J Also, R is the 
transpose of L.) 

The final task is to determine, given a matrix D, which is a finite product 
of Ls and Rs, and is different from R^ for all fc, the unique value j such that 
D = T xLxR J for some T. This can be determined by examining Euclid’s 
algorithm once more. 

The matrix form of Euclid’s algorithm discussed in theorem I3T1 computes 
a matrix D given a pair of positive numbers m and n; it maintains the 
invariant 

{m n) = (i!/)xD . 

D is initially the identity matrix and x and y are initialised to m and n, 
respectively; immediately following the initialisation process, D is repeatedly 
premultiplied by R so long as x is less than y. Simultaneously, y is reduced 
by x. The number of times that D is premultiplied by R is thus the greatest 
number j such that jxm is less than n, which is • Now suppose the 

input values m and n are coprime. Then, on termination of the algorithm, 
(1 1) x D equals (m n). That is, if 

Doo Doi\ 

D 10 DiJ ’ 

then, 


n —1 


Doi + Dn — 1 

m 


Dqo + Dio 


It remains to decide how to keep track of the levels in the tree. For this 
purpose, it is not necessary to maintain a counter. It suffices to observe that 
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D is a power of R exactly when the rationals in the Eisenstein-Stern, or 
Stern-Brocot, tree are integers, and this integer is the number of the next 
level in the tree (where the root is on level 0). So, it is easy to test whether 
the last matrix on the current level has been reached. Equally, the first 
matrix on the next level is easily calculated. For reasons we discuss in the 
next section, we choose to test whether the rational in the Eisenstein-Stern 
tree is an integer; that is, we evaluate the boolean D 0 o + D 10 = 1. In this 
way, we get the following (non-terminating) program which computes the 
successive values of D. 

D := I; 

do Doo + Dio = 1 —> D 
1=1 D 00 + D 10 1 y j : 

D : 

od 

A minor simplification of this algorithm is that the 1” in the assignment 
to j can be omitted. This is because and are equal when m and 

n are coprime and m is different from 1. We return to this shortly. 

4-2.3. The Enumerations 

As remarked earlier, we immediately get an enumeration of the rationals 
as displayed in the Eisenstein-Stern tree and as displayed in the Stern-Brocot 
tree — as each matrix is enumerated, simply premultiply by (1 1) or post- 
multiply by (j), respectively. 

In the case of enumerating the Eisenstein-Stern tree, several optimisations 
are possible. First, it is immediate from our derivation that the value assigned 
to the local variable j is a function of (1 1) x D. In turn, the matrix ( 2j J 1 l 1 J) 
is also a function of (1 1) x D. Let us name the function J, so that the 
assignment becomes 

D := D x J.((l 1) x D) . 

Then, the Eisenstein-Stern enumeration iteratively evaluates 

(1 1) x (D x J.((l 1) x D)) . 


1 o 

Doi+Dn 1 

Dpi + Du — 1 
Dpo + Dio 


= Dx( 2 > y;) 
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Matrix multiplication is associative; so this is 


((1 1) x D) x J.((l 1) x D) , 


which is also a function of (1 1) x D. Moreover —in anticipation of the 
current discussion— we have been careful to ensure that the test for a change 
in the level in the tree is also a function of (1 1) x D. Combined together, this 
means that, in order to enumerate the rationals in Eisenstein-Stern order, it is 
not necessary to compute D at each iteration, but only (1 1) x D. Naming 
the two components of this vector m and n, and simplifying the matrix 
multiplications, we gei@ 


m,n : = 1,1 ; 
do m — 1 —>- m,n 

□ 771 j^l —>■ 771,71 

od 


n+1, m 
77—1 


(2 


m 


+ 1) x m — n 


m 


At this point, a further simplification is also possible. We remarked earlier 
that [ n ~ 1 j equals [—J when m and n are coprime and m is different from 
1. By good fortune, it is also the case that (2 +1) x m — n simplifies to 

n+1 when 77i is equal to 1. That is, the elimination of 1” in the evaluation 
of the floor function leads to the elimination of the entire case analysis! This 
is the algorithm attributed to Newman in 


12 


77i,7i := 1,1 ; 
do m,n : = 

od 



+ 1) X 771 — n 


771 


4.24. Discussion 

Our construction of an algorithm for enumerating the rationals in Stern- 
Brocot order was motivated by reading two publications, [10], pp. 116-118] 


and 11]. Gibbons, Lester and Bird mi show how to enumerate the elements 
of the Eisenstein-Stern tree, but claim that “it is not at all obvious how to 
do this for the Stern-Brocot tree”. Specifically, they saj0: 


9 Recall that, to comply with existing literature, the enumerated rational is — and not 

m 

n 

10 Recall that they attribute the tree to Calkin and Wilt rather than Eisenstein and 
Stern. 
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However, there is an even better compensation for the loss of the 
ordering property in moving from the Stern-Brocot to the Calkin- 
Wilf tree: it becomes possible to deforest the tree altogether, and 
generate the rationals directly, maintaining no additional state 
beyond the ‘current’ rational. This startling observation is due 
to Moshe Newman (Newman, 2003). In contrast, it is not at all 
obvious how to do this for the Stern-Brocot tree; the best we can 
do seems to be to deforest the tree as far as its levels, but this 
still entails additional state of increasing size. 


In this section, we have shown that it is possible to enumerate the rationals 
in Stern-Brocot order without incurring “additional state of increasing size”. 
More importantly, we have presented one enumeration algorithm with two 
specialisations, one being the “Calkin-Wilf” enumeration they present, and 
the other being the Stern-Brocot enumeration that they described as being 
“not at all obvious”. 

The optimisation of Eisenstein-Stern enumeration which leads to New¬ 
man’s algorithm is not possible for Stern-Brocot enumeration. Nevertheless, 
the complexity of Stern-Brocot enumeration is the same as the complexity 
of Newman’s algorithm, both in time and space. The only disadvantage of 
Stern-Brocot enumeration is that four variables are needed in place of two; 
the advantage is the (well-known) advantage of the Stern-Brocot tree over 
the Eisenstein-Stern tree — the rationals on a given level are in ascending 
order. 

Gibbons, Lester and Bird’s goal seems to have been to show how the 
functional programming language Haskell implements the various construc¬ 
tions - the construction of the tree structures and Newman’s algorithm. In 
doing so, they repeat the existing mathematical presentations of the algo¬ 
rithms as given in 00 [lij. The ingredients for an efficient enumeration 


of the Stern-Brocot tree are all present in these publications, but the recipe 
is missing! 

The fact that expressing the rationals in “lowest form” is essential to 
the avoidance of duplication in any enumeration immediately suggests the 
relevance of Euclid’s algorithm. The key to our exposition is that Euclid’s 
algorithm can be expressed in terms of matrix multiplications, where - 
significantly— the underlying matrices are invertible. Transposition and in¬ 
version of the matrices capture the symmetry properties in a precise, calcu- 
lational framework. As a result, the bijection between the rationals and the 


32 




tree elements is immediate and we do not need to give separate, inductive 
proofs for both tree structures. Also, the determination of the next element 
in an enumeration of the tree elements has been reduced to one unifying 
construction. 

5. Conclusion 

In our view, much of mathematics is inherently algorithmic; it is also clear 
that, in the modern age, algorithmic problem solving is just as important, 
if not much more so, than in the 19th century. Somehow, however, mathe¬ 
matical education in the 20th century lost sight of its algorithmic roots. We 
hope to have exemplified in this paper how a fresh approach to introductory 
number theory that focuses on the algorithmic content of the theory can com¬ 
bine practicality with mathematical elegance. By continuing this endeavour 
we believe that the teaching of mathematics can be enriched and given new 
vigour. 
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Appendix: Historical Remarks 


The primary novel result of our paper is the construction given in section 
14.21 of an algorithm to enumerate the rationals in Stern-Brocot order. Apart 
from minor differences, this section of our paper was submitted in April 
2007 to the American Mathematical Monthly; it was rejected in November 
2007 on the grounds that it was not of sufficient interest to readers of the 
Monthly. One (of two referees) did, however, recommend publication. The 
referee made the following general comment. 


Each of the two trees of rationals—the Stern-Brocot tree and the 
Calkin-Wilf tree—has some history. Since this paper now gives 
the definitive link between these trees, I encourage the authors, 
perhaps in their Discussion section, to also give the definitive his¬ 
tories of these trees, something in the same spirit as the Remarks 
at the end of the Calkin and Wilf paper. 


Since the publication of [16], we have succeeded in obtaining copies of the 
original papers and it is indeed interesting to briefly review the papers. But 
we do not claim to provide “definitive histories of these trees” — that is a 
task for a historian of mathematics. 

Section [6] is about the paper ]j| published in 1858 by Stern. The surpris¬ 
ing fact that emerges from the review is that the so-called “Calkin-Wilf” tree 
of rationals, and not just the “Stern-Brocot” tree, is studied in detail in his 
paper. Moreover, of the two structures, the “Calkin-Wilf” tree is more read¬ 
ily recognised; the “Stern-Brocot” tree requires rather more understanding 


to identify. Brocot’s paper 17], which we review in section [7] is interesting 
because it illustrates how 19th century mathematics was driven by practical, 
algorithmic problems. (For additional historical remarks, see also [18].) 


6. Stern’s Paper 

Earlier we have commented that the structure that has recently been 
referred to as the “Calkin-Wilf” tree was documented by Stern [jlf| in 1858. 
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In this section we review those sections of Stern’s paper that are relevant to 
our own. 


6.1. The Eisenstein Array 

Stern’s paper is a detailed study of what has now become known as the 


; ‘Eisenstein array” of numbers (see, for example, 19J, sequence A064881]). 


(Stern’s paper cites two papers written by the more famous mathematician 
Gotthold Eisenstein; we have not read these papers.) Given two natural 
numbers m and n, Stern describes a process (which he attributes to Eisen¬ 
stein) of generating an infinite sequence of rows of numbers. The zeroth row 
in the sequence (“nullte Entwickelungsreihe”) is the given pair of numbers: 


m n . 

Subsequent rows are obtained by inserting between every pair of numbers 
the sum of the numbers. Thus the first row is 

m m+n n 

and the second row is 


m 2xm + n m+n m + 2xn 


n 


The process of constructing such rows is repeated indefinitely. The sequence 
of numbers obtained by concatenating the individual rows in order is what is 
now called the Eisenstein array and denoted by Ei(m,n ) (see, for example, 
191 sequence A064881]) . Stern refers to each occurrence of a number in 
rows other than the zeroth row as either a sum element (“Summenglicd”) or 
a source element (“Stammglied”). The sum elements are the newly added 
numbers. For example, in the first row the number m+n is a sum element; 
in the second row the number m+n is a source element. 


6.2. The Eisenstein-Stern Tree of Rationals 

A central element of Stern’s analysis of the Eisenstein array is the consid¬ 
eration of subsequences of numbers in individual rows. He calls these groups 
(“Gruppen”) and he records the properties of pairs of consecutive numbers 
(groups of size two — “zweigliedrige Gruppen”) and triples of consecutive 
numbers (groups of size three — “dreigliedrige Gruppen”). 

In sections 5 thru 8 of his paper, Stern studies Ei( 1,1), the Eisenstein 
array that begins with the pair (1,1). He proves that all pairs of consecutive 
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numbers in a given row are coprime and every pair of coprime numbers 
appears exactly once as such a pair of consecutive numbers. He does not use 
the word “tree” —tree structures are most probably an invention of modern 
computing science— and he does not refer to “rational numbers” —he refers 
instead to relatively prime numbers (“relatieve Primzahlen”)— but there is 
no doubt that, apart from the change in terminology, he describes the tree 
of rationals that in recent years has been referred to as the “Calkin-Wilf” 
tree of rationals. It is for this reason that we believe it is misleading to use 
the name “Calkin-Wilf tree” and prefer to use the name “Eisenstein-Stern 
tree”. Fig. [4] shows the first four rows of Ei( 1,1) and fig. [5] shows all pairs of 
consecutive numbers for each of the four rows. The pairs have been arranged 
so that the correspondence between Eg. [2] and fig. [3 is clear. 

1 1 


1 2 1 

1 3 2 3 1 

143525341 


Figure 4: First four rows of Ei( 1,1) 


(hi) 

( 1 , 2 ) ( 2 , 1 ) 

(1,3) (3,2) (2,3) (3,1) 

(1,4) (4,3) (3,5) (5,2) (2,5) (5,3) (3,4) (4,1) 

Figure 5: Pairs of consecutive numbers in the first four rows of Ei( 1,1) 

Other sections of Stern’s paper record additional properties of the tree, 
which we do not discuss here. For example, Stern discusses how often each 
number appears as a sum number. 
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6.3. The Stern-Brocot Tree of Rationals 


Identification of the so-called Stern-Brocot tree of rationals in Stern’s 
paper is more demanding. Recall the process of constructing a sequence of 
rows of numbers from a given pair of numbers m and n. It is clear that every 
number is a linear combination of m and n. Stern studies the coefficients 
(“Coefficienten”), i.e. the pair of multiplicative factors of m and n, defined by 
the linear combination. Fig. [6] displays the coefficients in a way that allows 
direct comparison with the Stern-Brocot tree of rationals (fig. [3]). (The 
reader may also wish to compare fig. EH with Graham, Knuth and Patashnik’s 


depiction of the tree 10|, p. 117].) 


1 xm + Oxn. 


Oxm + lxn 


1 xm + lxn 



4xm + lxn 3xm + 2xn 5xm + 2xn 3xm + 5xn 5xm + 3xn 2xm + 5xn 4xm + 3xn lxm + 4xn 


Figure 6: Tree of “coefficients” of Ei(m,n) 


The numbers at the top-left and top-right of fig. [(2 are the numbers m and 
n written as lxm + Oxn and Oxm + lxn, respectively, in order to make the 
coefficients clear. This, we recall, is the zeroth row in Stern’s structure. 

In the subsequent levels of the tree, only the sum elements are displayed. 
The correspondence between fig. [3] and fig. [3] should be easy to see; the 
number kxm + lxn in fig. [3] is displayed as the rational | in fig. |3] The 
“fundamental fact” (4.31) in lo[ is observed by Stern 15, equation (8), p.207] 
and used immediately to infer that coefficients are relatively prime. In section 
15 of his paper, Stern uses the (already proven) fact that the Eisenstein-Stern 
tree is a tree of (all) rationals to deduce that the Stern-Brocot tree is also a 
tree of rationals. 


6 . 4 . Newman’s Algorithm 

An interesting question is whether Stern also documents the algorithm 
currently attributed to Moshe Newman for enumerating the elements of the 
Eisenstein array. This is a question we found difficult to answer because of 


38 





our limited understanding of German. However, the answer would appear to 
be: almost, but not quite! 

As remarked earlier, Stern documents a number of properties of groups 
of numbers in rows of the Eisenstein array, in particular groups of size three. 
Of course, a group of size three comprises two groups of size two. Since 
groups of size two in the Eisenstein array correspond to rationals in the 
Eisenstein-Stern tree, by studying groups of size three Stern is effectively 
studying consecutive rationals in the Eisenstein-Stern tree of rationals. 

It is important to note that Stern’s focus is the sequence of rows of num¬ 
bers (in modern terminology, the tree of numbers) as opposed to the (flat¬ 
tened) sequence of numbers defined by Ei(m,n ) — significantly, the last 
number in one row and the first number in the next row do not form a 
“group” according to Stern’s definition. This means that, so far as we have 
been able to determine, he nowhere considers a triple of numbers that crosses 
a row boundary. 

Newman’s algorithm (in the form we use in section 14. 2. 3ft predicts that 
each triple of numbers in a given row of Ei( 1,1) has the form 


(2 - + 1) x b — a 
LoJ 


(Variable names have been chosen to facilitate comparison with Stern’s pa¬ 
per.) It follows immediately that the sum of the two outer elements of the 
triple is divisible by the middle element (that is, a+ ((2 + 1) x b — a) 

is divisible by 6); this fact is observed by Stern (for triples in a given row) 
in section 4 of his paper. Importantly for what follows, Stern observes that 
the property holds for Ei(m,n ) for arbitrary natural numbers m and n, and 
not just Ei( 1,1). Stern observes further 
Ei(m,n ) has the form 


151 . (4) p. 198] that each triple in 


(32) a b (2t + l)x6 — a 


for some number t. Stern identifies t as the number of rows preceding the 
current row in which the number b occurs as a sum element. (In particular, if 
b is a sum element then t equals 0.) Stern shows how to calculate t from the 
position of b in the row — effectively by expressing the position as a binary 
numeral. (Note that “i” is the variable name used in Stern’s paper; it has 
the same role as the variable “ in our derivation of the algorithm in section 

BZ2I) 
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So far as we have been able to determine, Stern does not explicitly re¬ 
mark that t equals [|J in the case of Ei( 1,1), but he does so implicitly in 
section 10 where he relates the the continued fraction representation of | 
to the row number in which the pair (a, b ) occurs. He does not appear to 
suggest a similar method for computing t in the general case of enumerating 
Ei(m,ri). However, it is straightforward to combine our derivation of New¬ 
man’s algorithm with Stern’s theorems to obtain an algorithm to enumerate 
the elements of Ei(m,n ) for arbitrary natural numbers m and n. Interested 
readers may consult our website (2o| where several implementations are dis¬ 
cussed. 

As stated at the beginning of this section, the conclusion is that Stern 
almost derives Newman’s algorithm, but not quite. On the other hand, 
because his analysis is of the general case Ei(m,n ) as opposed to Ei( 1,1), his 
results are more general. 

6.5. Stern-Brocot Enumeration 

We now turn to the question whether Stern also gives an algorithm for 
enumerating the rationals in Stern-Brocot order. 

To this end, we observe that the form (j32|) extends to the coefficients of 
each element of Ei(M,N), and hence to the elements of the Stern-Brocot 
tree. Specifically, triples in Ei(M,N) have the form 

n 0 M+m 0 N n^M+miN ((2k + l)ni — n 0 )M + ((2k + l)mi — m 0 )N 

It is easy to exploit this formula directly to get an enumeration of the ratio¬ 
nals in Stern-Brocot order, just as we did above to obtain an enumeration of 
Ei(M,N). Just recall that the Stern-Brocot rationals are given by the co¬ 
efficients of the sum elements, and the sum elements are the odd-numbered 
elements in the rows of Ei(M,N ) (where numbering starts from zero). The 
algorithm so obtained is the one we derived in section 14.2.31 

In this sense, Stern does indeed provide an algorithm for enumerating the 
rationals in Stern-Brocot order, albeit implicitly. However, as with Newman’s 
algorithm, he fails to observe the concise formula for the value of the variable 
k. Also, a major methodological difference is our exploitation of the concision 
and precision afforded by matrix algebra. Given the state of development of 
matrix algebra in 1858, Stern cannot be criticised for not doing the same. 

Finally, we remark that Stern returns to the properties of triples in section 
19 of his paper. Unfortunately, we have been unable to fully understand this 
section. 
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7. Brocot, the Watchmaker 


Achille Brocot was a famous French watchmaker who, some years before 
the publication of his paper [17J, had to fix some pendulums used for astro¬ 


nomical measurements. However, the device was incomplete and he did not 
know how to compute the number of teeth of cogs that were missing. He 
was unable to find any literature helpful to the solution of the problem, so, 
after some experiments, he devised a method to compute the numbers. In 
his paper, Brocot illustrates his method with the following example: 


A shaft turns once in 23 minutes. We want suitable cogs so that 
another shaft completes a revolution in 3 hours and 11 minutes, 
that is 191 minutes. 


The ratio between both speeds is bh-, so we can clearly choose a cog with 
191 teeth, and another one with 23 teeth. But, as Brocot wrote, it was not 
possible, at that time, to create cogs with so many teeth. And because 191 
and 23 are coprime, cogs with fewer teeth can only approximate the true 
ratio. 

Brocot’s contribution was a method to compute approximations to the 
true ratios (hence the title of his paper, “Calculus of cogs by approximation”). 
He begins by observing that ^ must be between the ratios | and |. If we 
choose the ratio |, the error is —7 since 8x23 = 1x191 — 7. This means 
that if we choose this ratio, the slower cog completes its revolution seven 
minutes early, i.e., after 8x23 minutes. On the other hand, if we choose the 
ratio |, the error is 16 since 9x23 = 1x191 + 16, meaning that the slower 
cog completes its revolution sixteen minutes late, i.e., after 9x23 minutes. 

Accordingly, Brocot writes two rows: 


8 1 -7 


9 1 +16 

His method consists in iteratively forming a new row, by adding the numbers 
in all three columns of the rows that produce the smallest error. Initially, 
we only have two rows, so we add the numbers in the three columns and we 
write the row of sums in the middle. 


8 

17 

9 


1 

2 

1 
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-7 

+9 

+16 




(If we choose the ratio y, the slower cog completes its revolution § minutes 

later, since y = .) Further approximations are constructed by adding 

a row adjacent to the row that minimises the error term. The process ends 
once we reach the error 0, which refers to the true ratio. The final state of 
the table is: 


8 

1 

-7 

33 

4 

-5 

58 

7 

-3 

83 

10 

-1 

191 

23 

0 

108 

13 

+1 

25 

3 

+2 

17 

2 

+9 

9 

1 

+16 


The conclusion is that the two closest approximations to yy are ratios of y 
(which runs -A minutes faster) and yy (which runs A minutes slower). We 
could continue this process, getting at each stage a closer approximation to 
yy. In fact, Brocot rehnes_the table shown above, in order to construct a 
multistage cog train (see 17, p. 191]). 

At each step in Brocot’s process we add a new ratio m n ^™, , which is usually 
called the mediant of — and y. Similarly, each node in the Stern-Brocot tree 


is of the form m+m ' where — is the nearest ancestor above and to the left, 

n-\-n' 7 n 

and y is the nearest ancestor above and to the right. (Consider, for example, 


the rational | in figure |3J Its nearest ancestor above and to the left is y and 
its nearest ancestor above and to the right is |.) Brocot’s process can be 
used to construct the Stern-Brocot tree: first, create an array that contains 
initially the rationals y and A then, insert the rational yyyp- between two 
adjacent fractions — and y. In the hrst step we add only one rational to 
the array 


0 1 1 

I’T’0 ’ 


but in the second step we add two new rationals: 


0 112 1 
I’2’1’I’O 
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Generally, in the n th step we add 2 n_1 new rationals. Clearly, this array can 
be represented as an infinite binary tree, whose first four levels are repre¬ 
sented in figure [3] (we omit the fractions y and i). 

The most interesting aspect to us of Brocot’s paper is that it solves an 
algorithmic problem. Brocot was faced with the practical problem of how 
to approximate rational numbers in order to construct clocks of satisfactory 
accuracy and his solution is undisputably an algorithm. Stern’s paper is 
closer to a traditional mathematical paper but, even so, it is an in-depth 
study of an algorithm for generating rows of numbers of increasing length. 


8. Conclusion 


There can be no doubt that what has been dubbed in recent years the 
“Calkin-Wilf” tree of rationals is, in fact, a central topic in Stern’s 1858 
paper. Calkin and Wilf 13] admit that in Stern’s paper “there is a structure 
that is essentially our tree of fractions” but add “in a different garb” and 
do not clarify what is meant by “a different garb”. It is unfortunate that 
the misleading name has now become prevalent; in order to avoid further 
misinterpretations of historical fact, it would be desirable for Stern’s paper 
to be translated into English. 

We have not attempted to determine how the name “Stern-Brocot” tree 
came into existence. It has been very surprising to us how much easier it 
is to identify the Eisenstein-Stern tree in Stern’s paper in comparison to 
identifying the Stern-Brocot tree. 
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